Abstract


This note describes a proposed addition of ECN (Explicit Congestion 
Notification) to IP. TCP is currently the dominant transport 
protocol used in the Internet. We begin by describing TCP’s use of 
packet drops as an indication of congestion. Next we argue that with 
the addition of active queue management (e.g., RED) to the Internet 
infrastructure, where routers detect congestion before the queue 
overflows, routers are no longer limited to packet drops as an 
indication of congestion. Routers could instead set a Congestion 
Experienced (CE) bit in the packet header of packets from ECN-capable 
transport protocols. We describe when the CE bit would be set in the 
routers, and describe what modifications would be needed to TCP to 
make it ECN-capable. Modifications to other transport protocols 
(e.g., unreliable unicast or multicast, reliable multicast, other 
reliable unicast transport protocols) could be considered as those 
protocols are developed and advance through the standards process. 


1. Conventions and Acronyms


The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in [B97].


2. Introduction


TCP’s congestion control and avoidance algorithms are based on the 
notion that the network is a black-box [Jacobson88, Jacobson90]. The 
network’s state of congestion or otherwise is determined by end- 
systems probing for the network state, by gradually increasing the 
load on the network (by increasing the window of packets that are 
outstanding in the network) until the network becomes congested and a 
packet is lost. Treating the network as a "black-box" and treating 
loss as an indication of congestion in the network is appropriate for 
pure best-effort data carried by TCP which has little or no 
sensitivity to delay or loss of individual packets. In addition, 
TCP’s congestion management algorithms have techniques built-in (such 
as Fast Retransmit and Fast Recovery) to minimize the impact of 
losses from a throughput perspective. 


However, these mechanisms are not intended to help applications that 
are in fact sensitive to the delay or loss of one or more individual 
packets. Interactive traffic such as telnet, web-browsing, and 
transfer of audio and video data can be sensitive to packet losses 
(using an unreliable data delivery transport such as UDP) or to the 
increased latency of the packet caused by the need to retransmit the 
packet after a loss (for reliable data delivery such as TCP). 


Since TCP determines the appropriate congestion window to use by 
gradually increasing the window size until it experiences a dropped 
packet, this causes the queues at the bottleneck router to build up. 
With most packet drop policies at the router that are not sensitive 
to the load placed by each individual flow, this means that some of 
the packets of latency-sensitive flows are going to be dropped. 
Active queue management mechanisms detect congestion before the queue 
overflows, and provide an indication of this congestion to the end 
nodes. The advantages of active queue management are discussed in 
RFC 2309 [RFC2309]. Active queue management avoids some of the bad 
properties of dropping on queue overflow, including the undesirable 
synchronization of loss across multiple flows. More importantly, 
active queue management means that transport protocols with 
congestion control (e.g., TCP) do not have to rely on buffer overflow 
as the only indication of congestion. This can reduce unnecessary 
queueing delay for all traffic sharing that queue. 


Active queue management mechanisms may use one of several methods for 
indicating congestion to end-nodes. One is to use packet drops, as is 
currently done. However, active queue management allows the router to 
separate policies of queueing or dropping packets from the policies 
for indicating congestion. Thus, active queue management allows 


Ramakrishnan & Floyd Experimental [Page 2] 


RFC 2481 ECN to IP January 1999 


routers to use the Congestion Experienced (CE) bit in a packet header 
as an indication of congestion, instead of relying solely on packet 
drops. 


3. Assumptions and General Principles 


In this section, we describe some of the important design principles 
and assumptions that guided the design choices in this proposal. 


(1) Congestion may persist over different time-scales. The time 
scales that we are concerned with are congestion events that may 
last longer than a round-trip time. 

(2) The number of packets in an individual flow (e.g., TCP connection 
or an exchange using UDP) may range from a small number of 
packets to quite a large number. We are interested in managing 
the congestion caused by flows that send enough packets so that 
they are still active when network feedback reaches them. 

(3) New mechanisms for congestion control and avoidance need to co- 
exist and cooperate with existing mechanisms for congestion 
control. In particular, new mechanisms have to co-exist with 
TCP’s current methods of adapting to congestion and with routers’ 
current practice of dropping packets in periods of congestion. 

(4) Because ECN is likely to be adopted gradually, accommodating 
migration is essential. Some routers may still only drop packets 
to indicate congestion, and some end-systems may not be ECN- 
capable. The most viable strategy is one that accommodates 
incremental deployment without having to resort to "islands" of 
ECN-capable and non-ECN-capable environments. 

(5) Asymmetric routing is likely to be a normal occurrence in the 
Internet. The path (sequence of links and routers) followed by 
data packets may be different from the path followed by the 
acknowledgment packets in the reverse direction. 

(6) Many routers process the "regular" headers in IP packets more 
efficiently than they process the header information in IP 
options. This suggests keeping congestion experienced 
information in the regular headers of an IP packet. 

(7) It must be recognized that not all end-systems will cooperate in 
mechanisms for congestion control. However, new mechanisms 
shouldn’t make it easier for TCP applications to disable TCP 
congestion control. The benefit of lying about participating in 
new mechanisms such as ECN-capability should be small. 


4. Random Early Detection (RED) 


Random Early Detection (RED) is a mechanism for active queue 
management that has been proposed to detect incipient congestion 
[FJ93], and is currently being deployed in the Internet backbone 
[RFC2309]. Although RED is meant to be a general mechanism using one 
of several alternatives for congestion indication, in the current
environment of the Internet RED is restricted to using packet drops 
as a mechanism for congestion indication. RED drops packets based on 
the average queue length exceeding a threshold, rather than only when 
the queue overflows. However, when RED drops packets before the 
queue actually overflows, RED is not forced by memory limitations to 
discard the packet. 
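

To make the idea of acting on an average rather than an instantaneous
queue length concrete, the following is a minimal sketch. It assumes
the exponentially weighted moving average described in [FJ93]; the
class name, the weight, and the single threshold are illustrative
choices of ours, and the two-threshold probabilistic marking of full
RED is deliberately omitted.

```python
class AvgQueue:
    """Tracks a smoothed queue length (hypothetical helper, not from [FJ93])."""

    def __init__(self, weight: float = 0.002, threshold: float = 5.0):
        self.weight = weight        # assumed smoothing weight
        self.threshold = threshold  # assumed congestion threshold, in packets
        self.avg = 0.0

    def update(self, queue_len: int) -> float:
        # Exponentially weighted moving average of the instantaneous queue.
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len
        return self.avg

    def congested(self) -> bool:
        # Incipient congestion is signalled by the average, not by overflow.
        return self.avg > self.threshold
```

With a small weight, a brief burst raises the average only slightly,
so a short transient does not by itself trigger a congestion
indication.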


RED could set a Congestion Experienced (CE) bit in the packet header 
instead of dropping the packet, if such a bit was provided in the IP 
header and understood by the transport protocol. The use of the CE 
bit would allow the receiver(s) to receive the packet, avoiding the 
potential for excessive delays due to retransmissions after packet 
losses. We use the term ’CE packet’ to denote a packet that has the 
CE bit set. 


5. Explicit Congestion Notification in IP 


We propose that the Internet provide a congestion indication for 
incipient congestion (as in RED and earlier work [RJ90]) where the 
notification can sometimes be through marking packets rather than 
dropping them. This would require an ECN field in the IP header with 
two bits. The ECN-Capable Transport (ECT) bit would be set by the 
data sender to indicate that the end-points of the transport protocol 
are ECN-capable. The CE bit would be set by the router to indicate 
congestion to the end nodes. Routers that have a packet arriving at 
a full queue would drop the packet, just as they do now. 


Bits 6 and 7 in the IPv4 TOS octet are designated as the ECN field. 
Bit 6 is designated as the ECT bit, and bit 7 is designated as the CE 
bit. The IPv4 TOS octet corresponds to the Traffic Class octet in 
IPv6. The definitions for the IPv4 TOS octet [RFC791] and the IPv6 
Traffic Class octet are intended to be superseded by the DS 
(Differentiated Services) Field [DIFFSERV]. Bits 6 and 7 are listed 
in [DIFFSERV] as Currently Unused. Section 19 gives a brief history 
of the TOS octet. 
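

As an illustration only, the bit positions above can be written as
masks. The sketch assumes the numbering convention of [RFC791], in
which bit 0 is the most significant bit of the octet, so that bit 6
has the value 0x02 and bit 7 the value 0x01. The helper names are
hypothetical.

```python
# Assumption: bit 0 is the most significant bit of the TOS octet.
ECT_MASK = 0x02  # bit 6: ECN-Capable Transport
CE_MASK = 0x01   # bit 7: Congestion Experienced


def is_ect(tos: int) -> bool:
    """True if the ECT bit is set in the TOS octet."""
    return bool(tos & ECT_MASK)


def is_ce(tos: int) -> bool:
    """True if the CE bit is set in the TOS octet."""
    return bool(tos & CE_MASK)


def set_ect(tos: int) -> int:
    """Return the TOS octet with ECT set; the other bits are unchanged."""
    return (tos | ECT_MASK) & 0xFF


def set_ce(tos: int) -> int:
    """Return the TOS octet with CE set; the other bits are unchanged."""
    return (tos | CE_MASK) & 0xFF
```

The upper six bits of the octet are left untouched by these helpers,
which matches the intent that the ECN field occupies only bits 6 and
7.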


Because of the unstable history of the TOS octet, the use of the ECN 
field as specified in this document cannot be guaranteed to be 
backwards compatible with all past uses of these two bits. The 
potential dangers of this lack of backwards compatibility are 
discussed in Section 19. 


Upon the receipt by an ECN-Capable transport of a single CE packet, 
the congestion control algorithms followed at the end-systems MUST be 
essentially the same as the congestion control response to a *single* 
dropped packet. For example, for ECN-Capable TCP the source TCP is 
required to halve its congestion window for any window of data 
containing either a packet drop or an ECN indication. However, we
would like to point out some notable exceptions in the reaction of 
the source TCP, related to following the shorter-time-scale details 
of particular implementations of TCP. For TCP’s response to an ECN 
indication, we do not recommend such behavior as the slow-start of 
Tahoe TCP in response to a packet drop, or Reno TCP’s wait of roughly 
half a round-trip time during Fast Recovery. 


One reason for requiring that the congestion-control response to the 
CE packet be essentially the same as the response to a dropped packet 
is to accommodate the incremental deployment of ECN in both end- 
systems and in routers. Some routers may drop ECN-Capable packets 
(e.g., using the same RED policies for congestion detection) while 
other routers set the CE bit, for equivalent levels of congestion. 
Similarly, a router might drop a non-ECN-Capable packet but set the 
CE bit in an ECN-Capable packet, for equivalent levels of congestion. 
Different congestion control responses to a CE bit indication and to 
a packet drop could result in unfair treatment for different flows. 


An additional requirement is that the end-systems should react to 
congestion at most once per window of data (i.e., at most once per 
roundtrip time), to avoid reacting multiple times to multiple 
indications of congestion within a roundtrip time. 


For a router, the CE bit of an ECN-Capable packet should only be set 
if the router would otherwise have dropped the packet as an 
indication of congestion to the end nodes. When the router’s buffer 
is not yet full and the router is prepared to drop a packet to inform 
end nodes of incipient congestion, the router should first check to 
see if the ECT bit is set in that packet’s IP header. If so, then 
instead of dropping the packet, the router MAY instead set the CE bit 
in the IP header. 
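

The router decision just described can be sketched as follows. This
is a simplified model under stated assumptions: the function is
called only for a packet that the active queue management algorithm
has already selected for a congestion indication, the mask values
assume bit 0 is the most significant bit of the TOS octet, and the
function and parameter names are hypothetical.

```python
ECT_MASK = 0x02  # bit 6 of the TOS octet (assumed MSB-first numbering)
CE_MASK = 0x01   # bit 7 of the TOS octet


def signal_congestion(tos: int, queue_full: bool, use_ecn: bool = True):
    """Return (action, tos) for a packet chosen for congestion indication.

    action is "drop" or "forward". use_ecn models the MAY: a router is
    permitted, not required, to mark instead of drop.
    """
    if queue_full:
        # A full queue leaves no choice but to drop.
        return "drop", tos
    if use_ecn and (tos & ECT_MASK):
        # ECN-capable packet: set CE instead of dropping.
        return "forward", tos | CE_MASK
    # Non-ECN-capable packet: indicate congestion by dropping.
    return "drop", tos
```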


An environment where all end nodes were ECN-Capable could allow new
criteria to be developed for setting the CE bit, and new congestion
control mechanisms for end-node reaction to CE packets. However,
this is a research issue, and as such is not addressed in this
document.


When a CE packet is received by a router, the CE bit is left 
unchanged, and the packet transmitted as usual. When severe 
congestion has occurred and the router’s queue is full, then the 
router has no choice but to drop some packet when a new packet 
arrives. We anticipate that such packet losses will become 
relatively infrequent when a majority of end-systems become ECN- 
Capable and participate in TCP or other compatible congestion control 
mechanisms. In an adequately-provisioned network in such an ECN- 
Capable environment, packet losses should occur primarily during 
transients or in the presence of non-cooperating sources.


We expect that routers will set the CE bit in response to incipient 
congestion as indicated by the average queue size, using the RED 
algorithms suggested in [FJ93, RFC2309]. To the best of our 
knowledge, this is the only proposal currently under discussion in 
the IETF for routers to drop packets proactively, before the buffer 
overflows. However, this document does not attempt to specify a 
particular mechanism for active queue management, leaving that 
endeavor, if needed, to other areas of the IETF. While ECN is 
inextricably tied up with active queue management at the router, the 
reverse does not hold; active queue management mechanisms have been 
developed and deployed independently from ECN, using packet drops as 
indications of congestion in the absence of ECN in the IP 
architecture. 


6. Support from the Transport Protocol 


ECN requires support from the transport protocol, in addition to the 
functionality given by the ECN field in the IP packet header. The 
transport protocol might require negotiation between the endpoints 
during setup to determine that all of the endpoints are ECN-capable, 
so that the sender can set the ECT bit in transmitted packets. 
Second, the transport protocol must be capable of reacting 
appropriately to the receipt of CE packets. This reaction could be 
in the form of the data receiver informing the data sender of the 
received CE packet (e.g., TCP), of the data receiver unsubscribing to 
a layered multicast group (e.g., RLM [MJV96]), or of some other 
action that ultimately reduces the arrival rate of that flow to that 
receiver. 


This document only addresses the addition of ECN Capability to TCP, 
leaving issues of ECN and other transport protocols to further 
research. For TCP, ECN requires three new mechanisms: negotiation 
between the endpoints during setup to determine if they are both 
ECN-capable; an ECN-Echo flag in the TCP header so that the data 
receiver can inform the data sender when a CE packet has been 
received; and a Congestion Window Reduced (CWR) flag in the TCP 
header so that the data sender can inform the data receiver that the 
congestion window has been reduced. The support required from other 
transport protocols is likely to be different, particularly for
unreliable or reliable multicast transport protocols, and will have 
to be determined as other transport protocols are brought to the IETF 
for standardization. 


6.1. TCP


The following sections describe in detail the proposed use of ECN in 
TCP. This proposal is described in essentially the same form in 
[Floyd94]. We assume that the source TCP uses the standard congestion 
control algorithms of Slow-start, Fast Retransmit and Fast Recovery 
[RFC2001].


This proposal specifies two new flags in the Reserved field of the 
TCP header. The TCP mechanism for negotiating ECN-Capability uses 
the ECN-Echo flag in the TCP header. (This was called the ECN Notify 
flag in some earlier documents.) Bit 9 in the Reserved field of the 
TCP header is designated as the ECN-Echo flag. The location of the 
6-bit Reserved field in the TCP header is shown in Figure 3 of RFC 
793 [RFC793]. 


To enable the TCP receiver to determine when to stop setting the 
ECN-Echo flag, we introduce a second new flag in the TCP header, the 
Congestion Window Reduced (CWR) flag. The CWR flag is assigned to 
Bit 8 in the Reserved field of the TCP header. 
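

For illustration, the two flag positions can be read from a raw TCP
header as sketched below. The sketch assumes that the bit numbers
count from the most significant bit of the 16-bit word that begins
with the Data Offset field (bytes 12 and 13 of the header), as drawn
in Figure 3 of [RFC793]; the function name is hypothetical.

```python
import struct

# Assumption: bit 0 is the MSB of the 16-bit word at header bytes 12-13.
CWR_MASK = 1 << (15 - 8)       # bit 8  -> 0x0080
ECN_ECHO_MASK = 1 << (15 - 9)  # bit 9  -> 0x0040


def ecn_flags(tcp_header: bytes):
    """Return (ecn_echo, cwr) as booleans from a raw TCP header."""
    (word,) = struct.unpack_from("!H", tcp_header, 12)
    return bool(word & ECN_ECHO_MASK), bool(word & CWR_MASK)
```

For example, a header whose bytes 12 and 13 are 0x50 and 0xC2 (a SYN
with both new flags set) yields (True, True).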


The use of these flags is described in the sections below.


6.1.1. TCP Initialization


In the TCP connection setup phase, the source and destination TCPs 
exchange information about their desire and/or capability to use ECN. 
Subsequent to the completion of this negotiation, the TCP sender sets 
the ECT bit in the IP header of data packets to indicate to the 
network that the transport is capable and willing to participate in 
ECN for this packet. This will indicate to the routers that they may 
mark this packet with the CE bit, if they would like to use that as a 
method of congestion notification. If the TCP connection does not 
wish to use ECN notification for a particular packet, the sending TCP 
sets the ECT bit equal to 0 (i.e., not set), and the TCP receiver 
ignores the CE bit in the received packet. 


When a node sends a TCP SYN packet, it may set the ECN-Echo and CWR 
flags in the TCP header. For a SYN packet, the setting of both the
ECN-Echo and CWR flags is defined as an indication that the sending
TCP is ECN-Capable, rather than as an indication of congestion or of 
response to congestion. More precisely, a SYN packet with both the 
ECN-Echo and CWR flags set indicates that the TCP implementation 
transmitting the SYN packet will participate in ECN as both a sender 
and receiver. As a receiver, it will respond to incoming data 
packets that have the CE bit set in the IP header by setting the 
ECN-Echo flag in outgoing TCP Acknowledgement (ACK) packets. As a 
sender, it will respond to incoming packets that have the ECN-Echo 
flag set by reducing the congestion window when appropriate.


When a node sends a SYN-ACK packet, it may set the ECN-Echo flag, but 
it does not set the CWR flag. For a SYN-ACK packet, the pattern of 
the ECN-Echo flag set and the CWR flag not set in the TCP header is 
defined as an indication that the TCP transmitting the SYN-ACK packet 
is ECN-Capable. 


There is the question of why we chose to have the TCP sending the SYN 
set two ECN-related flags in the Reserved field of the TCP header for 
the SYN packet, while the responding TCP sending the SYN-ACK sets 
only one ECN-related flag in the SYN-ACK packet. This asymmetry is 
necessary for the robust negotiation of ECN-capability with deployed 
TCP implementations. There exists at least one TCP implementation in 
which TCP receivers set the Reserved field of the TCP header in ACK 
packets (and hence the SYN-ACK) simply to reflect the Reserved field 
of the TCP header in the received data packet. Because the TCP SYN 
packet sets the ECN-Echo and CWR flags to indicate ECN-capability, 
while the SYN-ACK packet sets only the ECN-Echo flag, the sending TCP 
correctly interprets a receiver’s reflection of its own flags in the 
Reserved field as an indication that the receiver is not ECN-capable. 
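

The negotiation and the reason for its asymmetry can be sketched as
follows. The dictionaries stand in for the two flags of a TCP header,
and all function names are hypothetical. The sketch also assumes that
a responder sets ECN-Echo in the SYN-ACK only when the SYN carried
both flags.

```python
def syn_flags(want_ecn: bool) -> dict:
    """Flags an ECN-capable initiator places in its SYN."""
    return {"ecn_echo": want_ecn, "cwr": want_ecn}


def syn_ack_flags(syn: dict, responder_ecn_capable: bool) -> dict:
    """Flags a responder places in its SYN-ACK (ECN-Echo only, never CWR)."""
    offered = syn["ecn_echo"] and syn["cwr"]
    return {"ecn_echo": bool(offered and responder_ecn_capable), "cwr": False}


def ecn_negotiated(syn: dict, syn_ack: dict) -> bool:
    """Initiator's view: ECN is agreed only for ECN-Echo set and CWR clear."""
    offered = syn["ecn_echo"] and syn["cwr"]
    return bool(offered and syn_ack["ecn_echo"] and not syn_ack["cwr"])
```

A peer that merely reflects the Reserved field returns both flags
set, so ecn_negotiated(syn, dict(syn)) is False and the initiator
correctly concludes that the peer is not ECN-capable.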


6.1.2. The TCP Sender 


For a TCP connection using ECN, data packets are transmitted with the 
ECT bit set in the IP header (set to a "1"). If the sender receives 
an ECN-Echo ACK packet (that is, an ACK packet with the ECN-Echo flag 
set in the TCP header), then the sender knows that congestion was 
encountered in the network on the path from the sender to the 
receiver. The indication of congestion should be treated just as a 
congestion loss in non-ECN-Capable TCP. That is, the TCP source 
halves the congestion window "cwnd" and reduces the slow start
threshold "ssthresh". The sending TCP does NOT increase the
congestion window in response to the receipt of an ECN-Echo ACK
packet.


A critical condition is that TCP does not react to congestion 
indications more than once every window of data (or more loosely, 
more than once every round-trip time). That is, the TCP sender’s 
congestion window should be reduced only once in response to a series 
of dropped and/or CE packets from a single window of data. In
addition, the TCP source should not decrease the slow-start 
threshold, ssthresh, if it has been decreased within the last round 
trip time. However, if any retransmitted packets are dropped or have 
the CE bit set, then this is interpreted by the source TCP as a new 
instance of congestion. 


After the source TCP reduces its congestion window in response to a
CE packet, incoming acknowledgements that continue to arrive can 
"clock out" outgoing packets as allowed by the reduced congestion 
window. If the congestion window consists of only one MSS (maximum 
segment size), and the sending TCP receives an ECN-Echo ACK packet, 
then the sending TCP should in principle still reduce its congestion 
window in half. However, the value of the congestion window is 
bounded below by a value of one MSS. If the sending TCP were to 
continue to send, using a congestion window of 1 MSS, this results in 
the transmission of one packet per round-trip time. We believe it is 
desirable to still reduce the sending rate of the TCP sender even 
further, on receipt of an ECN-Echo packet when the congestion window 
is one. We use the retransmit timer as a means to reduce the rate 
further in this circumstance. Therefore, the sending TCP should also 
reset the retransmit timer on receiving the ECN-Echo packet when the 
congestion window is one. The sending TCP will then be able to send 
a new packet when the retransmit timer expires. 
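

The sender behavior described in this section can be modelled
roughly as below. This is a toy sketch, not a TCP implementation. It
assumes that "once per window of data" is enforced by remembering the
next sequence number at the time of the last reaction, that ssthresh
is set to half the old window with a floor of two segments, and that
CWR is still signalled when the window is already one MSS. All names
are hypothetical.

```python
class EcnSender:
    """Toy model of the sender-side ECN response (all sizes in bytes)."""

    def __init__(self, mss: int, cwnd: int, ssthresh: int):
        self.mss = mss
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.snd_nxt = 0          # next new sequence number to be sent
        self.recover = -1         # snd_nxt at the time of the last reaction
        self.cwr_pending = False  # set CWR on the next data packet sent
        self.timer_resets = 0     # stand-in for restarting the retransmit timer

    def on_ecn_echo_ack(self, ack_seq: int) -> bool:
        """Process an ACK carrying ECN-Echo; return True if we reacted."""
        if ack_seq <= self.recover:
            # Same window of data as the last reaction: do nothing more.
            return False
        if self.cwnd <= self.mss:
            # Window cannot shrink below one MSS; slow down via the timer.
            self.timer_resets += 1
        else:
            self.ssthresh = max(self.cwnd // 2, 2 * self.mss)  # assumed floor
            self.cwnd = max(self.cwnd // 2, self.mss)
        self.recover = self.snd_nxt
        self.cwr_pending = True  # assumption: signalled in both branches
        return True
```

Note that the window is never increased in this handler, in keeping
with the requirement that an ECN-Echo ACK does not open the window.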


[Floyd94] discusses TCP’s response to ECN in more detail. [Floyd98] 
discusses the validation test in the ns simulator, which illustrates 
a wide range of ECN scenarios. These scenarios include the following: 
an ECN followed by another ECN, a Fast Retransmit, or a Retransmit 
Timeout; a Retransmit Timeout or a Fast Retransmit followed by an 
ECN, and a congestion window of one packet followed by an ECN. 


TCP follows existing algorithms for sending data packets in response 
to incoming ACKs, multiple duplicate acknowledgements, or retransmit 
timeouts [RFC2001]. 


6.1.3. The TCP Receiver 


When TCP receives a CE data packet at the destination end-system, the 
TCP data receiver sets the ECN-Echo flag in the TCP header of the 
subsequent ACK packet. If there is any ACK withholding implemented, 
as in current "delayed-ACK" TCP implementations where the TCP 
receiver can send an ACK for two arriving data packets, then the 
ECN-Echo flag in the ACK packet will be set to the OR of the CE bits 
of all of the data packets being acknowledged. That is, if any of 
the received data packets are CE packets, then the returning ACK has 
the ECN-Echo flag set. 


To provide robustness against the possibility of a dropped ACK packet 
carrying an ECN-Echo flag, the TCP receiver must set the ECN-Echo 
flag in a series of ACK packets. The TCP receiver uses the CWR flag 
to determine when to stop setting the ECN-Echo flag. 


When an ECN-Capable TCP reduces its congestion window for any reason
(because of a retransmit timeout, a Fast Retransmit, or in response 
to an ECN Notification), the TCP sets the CWR flag in the TCP header 
of the first data packet sent after the window reduction. If that 
data packet is dropped in the network, then the sending TCP will have 
to reduce the congestion window again and retransmit the dropped 
packet. Thus, the Congestion Window Reduced message is reliably 
delivered to the data receiver. 


After a TCP receiver sends an ACK packet with the ECN-Echo bit set, 
that TCP receiver continues to set the ECN-Echo flag in ACK packets 
until it receives a CWR packet (a packet with the CWR flag set). 
After the receipt of the CWR packet, acknowledgements for subsequent 
non-CE data packets do not have the ECN-Echo flag set. If another CE 
packet is received by the data receiver, the receiver would once 
again send ACK packets with the ECN-Echo flag set. While the receipt 
of a CWR packet does not guarantee that the data sender received the 
ECN-Echo message, this does indicate that the data sender reduced its 
congestion window at some point *after* it sent the data packet for 
which the CE bit was set. 
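

The receiver rules above amount to a small state machine, sketched
here under stated assumptions. The class and method names are
hypothetical, and the sketch assumes that a packet carrying both CWR
and CE first ends the old echo and then starts a new one.

```python
class EcnReceiver:
    """Toy model of when a TCP receiver sets ECN-Echo in its ACKs."""

    def __init__(self):
        self.echoing = False     # keep setting ECN-Echo until CWR arrives
        self.ce_unacked = False  # a CE packet arrived since the last ACK

    def on_data(self, ce: bool, cwr: bool) -> None:
        """Record an arriving data packet's CE bit and CWR flag."""
        if cwr:
            # Sender reports a window reduction: stop echoing.
            self.echoing = False
        if ce:
            # A new CE packet (re)starts the echo.
            self.echoing = True
            self.ce_unacked = True

    def build_ack(self) -> bool:
        """Return the ECN-Echo flag for the next ACK (delayed or not)."""
        flag = self.echoing or self.ce_unacked  # OR over packets being ACKed
        self.ce_unacked = False
        return flag
```

Because the echo persists across ACKs, the loss of a single ACK
carrying ECN-Echo does not lose the congestion indication.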


We have already specified that a TCP sender reduces its congestion 
window at most once per window of data. This mechanism requires some 
care to make sure that the sender reduces its congestion window at 
most once per ECN indication, and that multiple ECN messages over 
several successive windows of data are properly reported to the ECN 
sender. This is discussed further in [Floyd98]. 


6.1.4. Congestion on the ACK-path 


For the current generation of TCP congestion control algorithms, pure 
acknowledgement packets (e.g., packets that do not contain any 
accompanying data) should be sent with the ECT bit off. Current TCP 
receivers have no mechanisms for reducing traffic on the ACK-path in 
response to congestion notification. Mechanisms for responding to 
congestion on the ACK-path are areas for current and future research. 
(One simple possibility would be for the sender to reduce its 
congestion window when it receives a pure ACK packet with the CE bit 
set). For current TCP implementations, a single dropped ACK generally 
has only a very small effect on the TCP’s sending rate. 


7. Summary of changes required in IP and TCP


Two bits need to be specified in the IP header, the ECN-Capable
Transport (ECT) bit and the Congestion Experienced (CE) bit. The ECT
bit set to "0" indicates that the transport protocol will ignore the
CE bit. This is the default value for the ECT bit. The ECT bit set
to "1" indicates that the transport protocol is willing and able to 
participate in ECN. 


The default value for the CE bit is "0". The router sets the CE bit
to "1" to indicate congestion to the end nodes. The CE bit in a
packet header should never be reset by a router from "1" to "0". 


TCP requires three changes: a negotiation phase during setup to
determine if both end nodes are ECN-capable, and two new flags in the
TCP header, from the "reserved" flags in the TCP flags field. The
ECN-Echo flag is used by the data receiver to inform the data sender
of a received CE packet. The Congestion Window Reduced flag is used
by the data sender to inform the data receiver that the congestion
window has been reduced.


8. Non-relationship to ATM’s EFCI indicator or Frame Relay’s FECN 


Since the ATM and Frame Relay mechanisms for congestion indication 
have typically been defined without any notion of average queue size 
as the basis for determining that an intermediate node is congested, 
we believe that they provide a very noisy signal. The TCP-sender 
reaction specified in this draft for ECN is NOT the appropriate 
reaction for such a noisy signal of congestion notification. It is 
our expectation that ATM’s EFCI and Frame Relay’s FECN mechanisms 
would be phased out over time within the ATM network. However, if 
the routers that interface to the ATM network have a way of 
maintaining the average queue at the interface, and use it to come to 
a reliable determination that the ATM subnet is congested, they may 
use the ECN notification that is defined here. 


We emphasize that a *single* packet with the CE bit set in an IP 
packet causes the transport layer to respond, in terms of congestion 
control, as it would to a packet drop. As such, the CE bit is not a 
good match to a transient signal such as one based on the 
instantaneous queue size. However, experiments in techniques at 
layer 2 (e.g., in ATM switches or Frame Relay switches) should be 
encouraged. For example, using a scheme such as RED (where packet 
marking is based on the average queue length exceeding a threshold), 
layer 2 devices could provide a reasonably reliable indication of 
congestion. When all the layer 2 devices in a path set that layer’s 
own Congestion Experienced bit (e.g., the EFCI bit for ATM, the FECN 
bit in Frame Relay) in this reliable manner, then the interface 
router to the layer 2 network could copy the state of that layer 2 
Congestion Experienced bit into the CE bit in the IP header. We 
recognize that this is not the current practice, nor is it in current 
standards. However, encouraging experimentation in this manner may 
provide the information needed to enable evolution of existing layer
2 mechanisms to provide a more reliable means of congestion 
indication, when they use a single bit for indicating congestion. 


9. Non-compliance by the End Nodes 


This section discusses concerns about the vulnerability of ECN to 
non-compliant end-nodes (i.e., end nodes that set the ECT bit in 
transmitted packets but do not respond to received CE packets). We 
argue that the addition of ECN to the IP architecture would not 
significantly increase the current vulnerability of the architecture 
to unresponsive flows. 


Even for non-ECN environments, there are serious concerns about the 
damage that can be done by non-compliant or unresponsive flows (that 
is, flows that do not respond to congestion control indications by 
reducing their arrival rate at the congested link). For example, an 
end-node could "turn off congestion control" by not reducing its 
congestion window in response to packet drops. This is a concern for
the current Internet. It has been argued that routers will have to
deploy mechanisms to detect and differentially treat packets from
non-compliant flows. It has also been argued that techniques such as
end-to-end per-flow scheduling and isolation of one flow from
another, differentiated services, or end-to-end reservations could
remove some of the more damaging effects of unresponsive flows.


It has been argued that dropping packets in itself may be an adequate 
deterrent for non-compliance, and that the use of ECN removes this 
deterrent. We would argue in response that (1) ECN-capable routers 
preserve packet-dropping behavior in times of high congestion; and 
(2) even in times of high congestion, dropping packets in itself is 
not an adequate deterrent for non-compliance. 


First, ECN-Capable routers will only mark packets (as opposed to 
dropping them) when the packet marking rate is reasonably low. During 
periods where the average queue size exceeds an upper threshold, and 
therefore the potential packet marking rate would be high, our 
recommendation is that routers drop packets rather than set the CE
bit in packet headers.


During the periods of low or moderate packet marking rates when ECN 
would be deployed, there would be little deterrent effect on 
unresponsive flows of dropping rather than marking those packets. For 
example, delay-insensitive flows using reliable delivery might have 
an incentive to increase rather than to decrease their sending rate 
in the presence of dropped packets. Similarly, delay-sensitive flows 
using unreliable delivery might increase their use of FEC in response 
to an increased packet drop rate, increasing rather than decreasing
their sending rate. For the same reasons, we do not believe that
packet dropping itself is an effective deterrent for non-compliance
even in an environment of high packet drop rates.


Several methods have been proposed to identify and restrict non-
compliant or unresponsive flows. The addition of ECN to the network
environment would not in any way increase the difficulty of designing
and deploying such mechanisms. If anything, the addition of ECN to
the architecture would make the job of identifying unresponsive flows
slightly easier. For example, in an ECN-Capable environment routers
are not limited to information about packets that are dropped or have
the CE bit set at that router itself; in such an environment routers
could also take note of arriving CE packets that indicate congestion
encountered by that packet earlier in the path.


10. Non-compliance in the Network


The breakdown of effective congestion control could be caused not 
only by a non-compliant end-node, but also by the loss of the 
congestion indication in the network itself. This could happen 
through a rogue or broken router that set the ECT bit in a packet 
from a non-ECN-capable transport, or "erased" the CE bit in arriving 
packets. As one example, a rogue or broken router that "erased" the 
CE bit in arriving CE packets would prevent that indication of 
congestion from reaching downstream receivers. This could result in 
the failure of congestion control for that flow and a resulting 
increase in congestion in the network, ultimately resulting in 
subsequent packets dropped for this flow as the average queue size 
increased at the congested gateway. 


The actions of a rogue or broken router could also result in an 
unnecessary indication of congestion to the end-nodes. These actions 
can include a router dropping a packet or setting the CE bit in the 
absence of congestion. From a congestion control point of view, 
setting the CE bit in the absence of congestion by a non-compliant 
router would be no different than a router dropping a packet 
unnecessarily. By "erasing" the ECT bit of a packet that is later 
dropped in the network, a router’s actions could result in an 
unnecessary packet drop for that packet later in the network. 


Concerns regarding the loss of congestion indications from 
encapsulated, dropped, or corrupted packets are discussed below. 


10.1. Encapsulated packets 


Some care is required to handle the CE and ECT bits appropriately 
when packets are encapsulated and de-encapsulated for tunnels. 


When a packet is encapsulated, the following rules apply regarding 
the ECT bit. First, if the ECT bit in the encapsulated (’inside’) 
header is a 0, then the ECT bit in the encapsulating (’outside’) 
header MUST be a 0. If the ECT bit in the inside header is a 1, then 
the ECT bit in the outside header SHOULD be a 1. 


When a packet is de-encapsulated, the following rules apply regarding 
the CE bit. If the ECT bit is a 1 in both the inside and the outside 
header, then the CE bit in the outside header MUST be ORed with the 
CE bit in the inside header. (That is, in this case a CE bit of 1 in 
the outside header must be copied to the inside header.) If the ECT 
bit in either header is a 0, then the CE bit in the outside header is 
ignored. This requirement for the treatment of de-encapsulated 
packets does not currently apply to IPsec tunnels. 
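
As a minimal sketch of the two rules above, the functions below treat 
each bit as the integer 0 or 1. The function names are hypothetical 
and chosen only for this illustration; the sketch follows the SHOULD 
for copying an inner ECT bit of 1 to the outer header, and it does 
not apply to IPsec tunnels. 

```python
def outer_ect_on_encapsulation(inner_ect: int) -> int:
    """ECT bit to place in the outer header when a packet enters a tunnel."""
    # An inner ECT of 0 forces an outer ECT of 0 (MUST); an inner ECT
    # of 1 is copied to the outer header (SHOULD).
    return inner_ect


def inner_ce_on_decapsulation(inner_ect: int, inner_ce: int,
                              outer_ect: int, outer_ce: int) -> int:
    """CE bit left in the inner header when a packet leaves a tunnel."""
    if inner_ect == 1 and outer_ect == 1:
        # Both headers are ECN-capable: OR the outer CE bit into the inner.
        return inner_ce | outer_ce
    # Otherwise the CE bit in the outer header is ignored.
    return inner_ce
```

A CE bit set by a router inside the tunnel therefore reaches the 
receiver only when both headers carry an ECT bit of 1. 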


A specific example of the use of ECN with encapsulation occurs when a 
flow wishes to use ECN-capability to avoid the danger of an 
unnecessary packet drop for the encapsulated packet as a result of 
congestion at an intermediate node in the tunnel. This functionality 
can be supported by copying the ECN field in the inner IP header to 
the outer IP header upon encapsulation, and using the ECN field in 
the outer IP header to set the ECN field in the inner IP header upon 
decapsulation. This effectively allows routers along the tunnel to 
cause the CE bit to be set in the ECN field of the unencapsulated IP 
header of an ECN-capable packet when such routers experience 
congestion. 


10.2. IPsec Tunnel Considerations 


The IPsec protocol, as defined in [ESP, AH], does not include the IP 
header’s ECN field in any of its cryptographic calculations (in the 
case of tunnel mode, the outer IP header’s ECN field is not 
included). Hence modification of the ECN field by a network node has 
no effect on IPsec’s end-to-end security, because it cannot cause any 
IPsec integrity check to fail. As a consequence, IPsec does not 
provide any defense against an adversary’s modification of the ECN 
field (i.e., a man-in-the-middle attack), as the adversary’s 
modification will also have no effect on IPsec’s end-to-end security. 
In some environments, the ability to modify the ECN field without 
affecting IPsec integrity checks may constitute a covert channel; if 
it is necessary to eliminate such a channel or reduce its bandwidth, 
then the outer IP header’s ECN field can be zeroed at the tunnel 
ingress and egress nodes. 


The IPsec protocol currently requires that the inner header’s ECN 
field not be changed by IPsec decapsulation processing at a tunnel 
egress node. This ensures that an adversary’s modifications to the 
ECN field cannot be used to launch theft- or denial-of-service 
attacks across an IPsec tunnel endpoint, as any such modifications 
will be discarded at the tunnel endpoint. This document makes no 
change to that IPsec requirement. As a consequence of the current 
specification of the IPsec protocol, we suggest that experiments with 
ECN not be carried out for flows that will undergo IPsec tunneling at 
the present time. 


If the IPsec specifications are modified in the future to permit a 
tunnel egress node to modify the ECN field in an inner IP header 
based on the ECN field value in the outer header (e.g., copying part 
or all of the outer ECN field to the inner ECN field), or to permit 
the ECN field of the outer IP header to be zeroed during 
encapsulation, then experiments with ECN may be used in combination 
with IPsec tunneling. 


This discussion of ECN and IPsec tunnel considerations draws heavily 
on related discussions and documents from the Differentiated Services 
Working Group. 


10.3. Dropped or Corrupted Packets 


An additional issue concerns a packet that has the CE bit set at one 
router and is dropped by a subsequent router. For the proposed use 
for ECN in this paper (that is, for a transport protocol such as TCP 
for which a dropped data packet is an indication of congestion), end 
nodes detect dropped data packets, and the congestion response of the 
end nodes to a dropped data packet is at least as strong as the 
congestion response to a received CE packet. 


However, transport protocols such as TCP do not necessarily detect 
all packet drops, such as the drop of a "pure" ACK packet; for 
example, TCP does not reduce the arrival rate of subsequent ACK 
packets in response to an earlier dropped ACK packet. Any proposal 
for extending ECN-Capability to such packets would have to address 
concerns raised by CE packets that were later dropped in the network. 


Similarly, if a CE packet is dropped later in the network due to 
corruption (bit errors), the end nodes should still invoke congestion 
control, just as TCP would today in response to a dropped data 
packet. This issue of corrupted CE packets would have to be 
considered in any proposal for the network to distinguish between 
packets dropped due to corruption, and packets dropped due to 
congestion or buffer overflow. 


11. A summary of related work. 


[Floyd94] considers the advantages and drawbacks of adding ECN to the 
TCP/IP architecture. As shown in the simulation-based comparisons, 
one advantage of ECN is to avoid unnecessary packet drops for short 
or delay-sensitive TCP connections. A second advantage of ECN is in 
avoiding some unnecessary retransmit timeouts in TCP. This paper 
discusses in detail the integration of ECN into TCP’s congestion 
control mechanisms. The possible disadvantages of ECN discussed in 
the paper are that a non-compliant TCP connection could falsely 
advertise itself as ECN-capable, and that a TCP ACK packet carrying 
an ECN-Echo message could itself be dropped in the network. The 
first of these two issues is discussed in Section 8 of this document, 
and the second is addressed by the proposal in Section 5.1.3 for a 
CWR flag in the TCP header. 


[CKLTZ97] reports on an experimental implementation of ECN in IPv6. 
The experiments include an implementation of ECN in an existing 
implementation of RED for FreeBSD. A number of experiments were run 
to demonstrate the control of the average queue size in the router, 
the performance of ECN for a single TCP connection at a congested 
router, and fairness with multiple competing TCP connections. One 
conclusion of the experiments is that dropping packets from a bulk- 
data transfer can degrade performance much more severely than marking 
packets. 


Because the experimental implementation in [CKLTZ97] predates some of 
the developments in this document, the implementation does not 
conform to this document in all respects. For example, in the 
experimental implementation the CWR flag is not used, but instead the 
TCP receiver sends the ECN-Echo bit on a single ACK packet. 


[K98] and [CKLTZ98] build on [CKLTZ97] to further analyze the 
benefits of ECN for TCP. The conclusions are that ECN TCP gets 
moderately better throughput than non-ECN TCP; that ECN TCP flows are 
fair towards non-ECN TCP flows; and that ECN TCP is robust with two- 
way traffic, congestion in both directions, and with multiple 
congested gateways. Experiments with many short web transfers show 
that, while most of the short connections have similar transfer times 
with or without ECN, a small percentage of the short connections have 
very long transfer times for the non-ECN experiments as compared to 
the ECN experiments. This increased transfer time is particularly 
dramatic for those short connections that have their first packet 
dropped in the non-ECN experiments, and that therefore have to wait 
six seconds for the retransmit timer to expire. 


The ECN Web Page [ECN] has pointers to other implementations of ECN 
in progress. 


12. Conclusions 


Given the current effort to implement RED, we believe this is the 
right time for router vendors to examine how to implement congestion 
avoidance mechanisms that do not depend on packet drops alone. With 
the increased deployment of applications and transports sensitive to 
the delay and loss of a single packet (e.g., realtime traffic, short 
web transfers), depending on packet loss as a normal congestion 
notification mechanism appears to be insufficient (or at the very 
least, non-optimal). 


15. Security Considerations 


Security considerations have been discussed in Section 9. 


16. IPv4 Header Checksum Recalculation 


IPv4 header checksum recalculation is an issue with some high-end 
router architectures using an output-buffered switch, since most if 
not all of the header manipulation is performed on the input side of 
the switch, while the ECN decision would need to be made local to the 
output buffer. This is not an issue for IPv6, since there is no IPv6 
header checksum. The IPv4 TOS octet is the last byte of a 16-bit 
half-word. 


RFC 1141 [RFC1141] discusses the incremental updating of the IPv4 
checksum after the TTL field is decremented. The incremental 
updating of the IPv4 checksum after the CE bit was set would work as 
follows: Let HC be the original header checksum, and let HC’ be the 
new header checksum after the CE bit has been set. Then for header 
checksums calculated with one’s complement subtraction, HC’ would be 
recalculated as follows: 


HC’ = { HC - 1    HC > 1 
      { 0x0000    HC = 1 


For header checksums calculated on two’s complement machines, HC’ 
would be recalculated as follows after the CE bit was set: 


HC’ = { HC - 1    HC > 0 
      { 0xFFFE    HC = 0 
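

The two’s complement rule can be checked against a full 
recalculation of the header checksum. The sketch below is 
illustrative only: the function names are hypothetical, and it 
assumes that the CE bit is the least significant bit of the TOS octet 
(bit 7 in the numbering used in this document) and that the CE bit is 
not already set when the update is applied. 

```python
CE = 0x01  # assumed position: least significant bit of the TOS octet


def full_checksum(header) -> int:
    """One's complement IPv4 header checksum, skipping bytes 10-11."""
    total = 0
    for i in range(0, len(header), 2):
        if i == 10:
            continue  # the checksum field itself counts as zero
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        # Fold any carry back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def set_ce_incremental(header: bytearray) -> None:
    """Set the CE bit and patch the checksum without a full recalculation."""
    if header[1] & CE:
        raise ValueError("CE bit already set; the update rule does not apply")
    hc = (header[10] << 8) | header[11]
    new_hc = hc - 1 if hc > 0 else 0xFFFE
    header[1] |= CE
    header[10:12] = new_hc.to_bytes(2, "big")
```

After `set_ce_incremental` runs on a header with a valid checksum, 
the stored checksum should equal the value that `full_checksum` 
computes over the modified header, including the case where the 
original checksum was zero. 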


17. The motivation for the ECT bit. 


The need for the ECT bit is motivated by the fact that ECN will be 
deployed incrementally in an Internet where some transport protocols 
and routers understand ECN and some do not. With the ECT bit, the 
router can drop packets from flows that are not ECN-capable, but can 
*instead* set the CE bit in flows that *are* ECN-capable. Because the 
ECT bit allows an end node to have the CE bit set in a packet 
*instead* of having the packet dropped, an end node might have some 
incentive to deploy ECN. 


If there was no ECT indication, then the router would have to set the 
CE bit for packets from both ECN-capable and non-ECN-capable flows. 
In this case, there would be no incentive for end-nodes to deploy 
ECN, and no viable path of incremental deployment from a non-ECN 
world to an ECN-capable world. Consider the first stages of such an 
incremental deployment, where a subset of the flows are ECN-capable. 
At the onset of congestion, when the packet dropping/marking rate 
would be low, routers would only set CE bits, rather than dropping 
packets. However, only those flows that are ECN-capable would 
understand and respond to CE packets. The result is that the ECN- 
capable flows would back off, and the non-ECN-capable flows would be 
unaware of the ECN signals and would continue to open their 
congestion windows. 


In this case, there are two possible outcomes: (1) the ECN-capable 
flows back off, the non-ECN-capable flows get all of the bandwidth, 
and congestion remains mild, or (2) the ECN-capable flows back off, 
the non-ECN-capable flows don’t, and congestion increases until the 
router transitions from setting the CE bit to dropping packets. 
While this second outcome evens out the fairness, the ECN-capable 
flows would still receive little benefit from being ECN-capable, 
because the increased congestion would drive the router to packet- 
dropping behavior. 


A flow that advertises itself as ECN-Capable but does not respond to 
CE bits is functionally equivalent to a flow that turns off 
congestion control, as discussed in Sections 8 and 9. 


Ramakrishnan & Floyd Experimental [Page 20] 


RFC 2481 ECN to IP January 1999 




Thus, in a world where a subset of the flows are ECN-capable, but 
where ECN-capable flows have no mechanism for indicating that fact to 
the routers, there would be less effective and less fair congestion 
control in the Internet, resulting in a strong incentive for end 
nodes not to deploy ECN. 


18. Why use two bits in the IP header? 


Given the need for an ECT indication in the IP header, there still 
remains the question of whether the ECT (ECN-Capable Transport) and 
CE (Congestion Experienced) indications should be overloaded on a 
single bit. This overloaded-one-bit alternative, explored in 
[Floyd94], would involve a single bit with two values. One value, 
"ECT and not CE", would represent an ECN-Capable Transport, and the 
other value, "CE or not ECT", would represent either Congestion 
Experienced or a non-ECN-Capable transport. 


One difference between the one-bit and two-bit implementations 
concerns packets that traverse multiple congested routers. Consider 
a CE packet that arrives at a second congested router, and is 
selected by the active queue management at that router for either 
marking or dropping. In the one-bit implementation, the second 
congested router has no choice but to drop the CE packet, because it 
cannot distinguish between a CE packet and a non-ECT packet. In the 
two-bit implementation, the second congested router has the choice of 
either dropping the CE packet, or of leaving it alone with the CE bit 
set. 
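

The difference at a congested router can be sketched as follows. The 
function names and the action labels are hypothetical, and for 
simplicity the sketch assumes that a packet which is ECN-capable and 
not yet marked is always marked rather than dropped. 

```python
def one_bit_choices(ect_and_not_ce: bool) -> tuple:
    """Actions open to a router for a selected packet, one-bit scheme."""
    if ect_and_not_ce:
        return ("mark",)
    # "CE or not ECT": the router cannot tell which, so it must drop.
    return ("drop",)


def two_bit_choices(ect: bool, ce: bool) -> tuple:
    """Actions open to a router for a selected packet, two-bit scheme."""
    if not ect:
        return ("drop",)
    if not ce:
        return ("mark",)
    # Already marked upstream: drop it, or leave it with the CE bit set.
    return ("drop", "leave")
```

Only the two-bit scheme offers the second congested router an 
alternative to dropping a packet that already carries a congestion 
indication. 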


Another difference between the one-bit and two-bit implementations 
comes from the fact that with the one-bit implementation, receivers 
in a single flow cannot distinguish between CE and non-ECT packets. 
Thus, in the one-bit implementation an ECN-capable data sender would 
have to unambiguously indicate to the receiver or receivers whether 
each packet had been sent as ECN-Capable or as non-ECN-Capable. One 
possibility would be for the sender to indicate in the transport 
header whether the packet was sent as ECN-Capable. A second 
possibility that would involve a functional limitation for the one- 
bit implementation would be for the sender to unambiguously indicate 
that it was going to send *all* of its packets as ECN-Capable or as 
non-ECN-Capable. For a multicast transport protocol, this 
unambiguous indication would have to be apparent to receivers joining 
an on-going multicast session. 


Another advantage of the two-bit approach is that it is somewhat more 
robust. The most critical issue, discussed in Section 8, is that the 
default indication should be that of a non-ECN-Capable transport. In 
a two-bit implementation, this requirement for the default value 
simply means that the ECT bit should be ‘OFF’ by default. In the 
one-bit implementation, this means that the single overloaded bit 
should by default be in the "CE or not ECT" position. This is less 
clear and straightforward, and possibly more open to incorrect 
implementations either in the end nodes or in the routers. 


In summary, while the one-bit implementation could be a possible 
implementation, it has the following significant limitations relative 
to the two-bit implementation. First, the one-bit implementation has 
more limited functionality for the treatment of CE packets at a 
second congested router. Second, the one-bit implementation requires 
either that extra information be carried in the transport header of 
packets from ECN-Capable flows (to convey the functionality of the 
second bit elsewhere, namely in the transport header), or that 
senders in ECN-Capable flows accept the limitation that receivers 
must be able to determine a priori which packets are ECN-Capable and 
which are not ECN-Capable. Third, the one-bit implementation is 
possibly more open to errors from faulty implementations that choose 
the wrong default value for the ECN bit. We believe that the use of 
the extra bit in the IP header for the ECT-bit is extremely valuable 
to overcome these limitations. 


19. Historical definitions for the IPv4 TOS octet 


RFC 791 [RFC791] defined the ToS (Type of Service) octet in the IP 
header. In RFC 791, bits 6 and 7 of the ToS octet are listed as 
"Reserved for Future Use", and are shown set to zero. The first two 
fields of the ToS octet were defined as the Precedence and Type of 
Service (TOS) fields. 


   0     1     2     3     4     5     6     7
+-----+-----+-----+-----+-----+-----+-----+-----+
|   PRECEDENCE    |       TOS       |  0  |  0  |  RFC 791
+-----+-----+-----+-----+-----+-----+-----+-----+


RFC 1122 included bits 6 and 7 in the TOS field, though it did not 
discuss any specific use for those two bits: 


   0     1     2     3     4     5     6     7
+-----+-----+-----+-----+-----+-----+-----+-----+
|   PRECEDENCE    |             TOS             |  RFC 1122
+-----+-----+-----+-----+-----+-----+-----+-----+


The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows: 


   0     1     2     3     4     5     6     7
+-----+-----+-----+-----+-----+-----+-----+-----+
|   PRECEDENCE    |          TOS          | MBZ |  RFC 1349
+-----+-----+-----+-----+-----+-----+-----+-----+


Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary 
Cost". In addition to the Precedence and Type of Service (TOS) 
fields, the last field, MBZ (for "must be zero") was defined as 
currently unused. RFC 1349 stated that "The originator of a datagram 
sets [the MBZ] field to zero (unless participating in an Internet 
protocol experiment which makes use of that bit)." 


RFC 1455 [RFC1455] defined an experimental standard that used all 
four bits in the TOS field to request a guaranteed level of link 
security. 


RFC 1349 is obsoleted by "Definition of the Differentiated Services 
Field (DS Field) in the IPv4 and IPv6 Headers" [DIFFSERV], in which 
bits 6 and 7 of the DS field are listed as Currently Unused (CU). 
The first six bits of the DS field are defined as the Differentiated 
Services CodePoint (DSCP): 


0 1 2 3 4 5 6 7 
+-----+-----+-----+-----+-----+-----+-----+-----+
|               DSCP                |    CU     |
+-----+-----+-----+-----+-----+-----+-----+-----+
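

As a small illustration of this layout, the sketch below splits an 
octet into its DSCP and CU parts. The function name is hypothetical; 
the sketch relies only on the layout shown above, with bit 0 as the 
most significant bit of the octet. 

```python
def split_ds_octet(octet: int) -> tuple:
    """Split the DS field octet into (DSCP, CU)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    dscp = octet >> 2    # bits 0-5: the six most significant bits
    cu = octet & 0x03    # bits 6-7: the two least significant bits
    return dscp, cu
```

The two CU bits returned here are the bits that this document 
proposes to use for the ECN field. 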


Because of this unstable history, the definition of the ECN field in 
this document cannot be guaranteed to be backwards compatible with 
all past uses of these two bits. The damage that could be done by a 
non-ECN-capable router would be to "erase" the CE bit for an ECN- 
capable packet that arrived at the router with the CE bit set, or set 
the CE bit even in the absence of congestion. This has been 
discussed in Section 10 on "Non-compliance in the Network". 


The damage that could be done in an ECN-capable environment by a 
non-ECN-capable end-node transmitting packets with the ECT bit set 
has been discussed in Section 9 on "Non-compliance by the End Nodes". 


This document and translations of it may be copied and furnished to 
others, and derivative works that comment on or otherwise explain it 
or assist in its implementation may be prepared, copied, published 
and distributed, in whole or in part, without restriction of any 
kind, provided that the above copyright notice and this paragraph are 
included on all such copies and derivative works. However, this 
document itself may not be modified in any way, such as by removing 
the copyright notice or references to the Internet Society or other 
Internet organizations, except as needed for the purpose of 
developing Internet standards in which case the procedures for 
copyrights defined in the Internet Standards process must be 
followed, or as required to translate it into languages other than 
English. 


The limited permissions granted above are perpetual and will not be 
revoked by the Internet Society or its successors or assigns. 


This document and the information contained herein is provided on an 
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 